Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 24639620 |
| Missing cells | 355922 |
| Missing cells (%) | 0.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.5 GiB |
| Average record size in memory | 64.0 B |
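The headline figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below only recombines numbers printed in this report. The 8-bytes-per-cell figure is inferred from the 64.0 B average record size spread over 8 variables, so it is an assumption about how the profiler counted memory rather than something the report states.

```python
# Sanity-check the dataset statistics using only numbers printed in the report.
n_rows = 24_639_620
n_cols = 8
missing_cells = 355_922
avg_record_bytes = 64.0  # reported average record size in memory

# Share of all cells that are missing.
total_cells = n_rows * n_cols
missing_pct = 100 * missing_cells / total_cells

# Total footprint implied by the average record size.
total_gib = n_rows * avg_record_bytes / 2**30

# Assumption: every cell is counted as 8 bytes (64 B / 8 variables).
column_mib = n_rows * (avg_record_bytes / n_cols) / 2**20

# Two variables each report 177961 missing values further down the report.
missing_per_variable = 177_961

print(f"missing cells: {missing_pct:.2f}%")
print(f"total size:    {total_gib:.2f} GiB")
print(f"per column:    {column_mib:.1f} MiB")
print("missing cells explained:", 2 * missing_per_variable == missing_cells)
```

Rounded to one decimal place, these reproduce the reported 0.2%, 1.5 GiB and the 188.0 MiB memory size shown for each variable. The missing-cell total is exactly twice 177961, which is also the number of Meteorología rows.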
Variable types
| CAT | 5 |
|---|---|
| NUM | 2 |
| DATE | 1 |
Warnings
| Warning | Type |
|---|---|
| Componente is highly correlated with Variable | High correlation |
| Variable is highly correlated with Componente | High correlation |
| Tipo is highly correlated with Ponderación | High correlation |
| Ponderación is highly correlated with Tipo | High correlation |
| Valor is highly skewed (γ1 = -519.2066934) | Skewed |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2020-11-02 01:49:56.045692 |
|---|---|
| Analysis finished | 2020-11-02 01:57:06.984076 |
| Duration | 7 minutes and 10.94 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
df_index
Numeric
| Distinct | 24639620 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24635388.33 |
|---|---|
| Minimum | 0 |
| Maximum | 49279236 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 188.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2460560.95 |
| Q1 | 12315292 |
| median | 24634342.5 |
| Q3 | 36954650.25 |
| 95-th percentile | 46814348.05 |
| Maximum | 49279236 |
| Range | 49279236 |
| Interquartile range (IQR) | 24639358.25 |
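Range and IQR in the table above are derived rows: range is maximum minus minimum, and IQR is Q3 minus Q1. A minimal sketch with the standard library follows. The `sample` list is made up for illustration, and `method="inclusive"` (linear interpolation between order statistics) is assumed to match the profiler's quantile convention, which the report does not state.

```python
from statistics import quantiles


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR for a list of numbers."""
    ordered = sorted(values)
    # n=4 yields the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }


sample = [0, 1, 3, 5, 7, 9, 12, 20, 21]  # hypothetical data
summary = quantile_summary(sample)
print(summary)

# The same identities applied to the figures reported above.
reported_iqr = 36954650.25 - 12315292
reported_range = 49279236 - 0
print(reported_iqr, reported_range)
```

Both derived values agree with the table: 24639358.25 for the IQR and 49279236 for the range.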
Descriptive statistics
| Standard deviation | 14226109.26 |
|---|---|
| Coefficient of variation (CV) | 0.5774664101 |
| Kurtosis | -1.199992716 |
| Mean | 24635388.33 |
| Median Absolute Deviation (MAD) | 12319661.5 |
| Skewness | 0.0002424848105 |
| Sum | 6.07006607e+14 |
| Variance | 2.023821846e+14 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 25169918 | 1 | < 0.1% |
| 48679309 | 1 | < 0.1% |
| 27697536 | 1 | < 0.1% |
| 31893889 | 1 | < 0.1% |
| 19313026 | 1 | < 0.1% |
| 1811456 | 1 | < 0.1% |
| 48662917 | 1 | < 0.1% |
| 2527622 | 1 | < 0.1% |
| 33774481 | 1 | < 0.1% |
| 23525771 | 1 | < 0.1% |
| Other values (24639610) | 24639610 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49279236 | 1 | < 0.1% |
| 49279233 | 1 | < 0.1% |
| 49279232 | 1 | < 0.1% |
| 49279231 | 1 | < 0.1% |
| 49279229 | 1 | < 0.1% |
Estación
Categorical
| Distinct | 30 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.0 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| CAI Venecia | 1495739 | 6.1% |
| CAI 20 de Julio | 1474771 | 6.0% |
| Estación Monitoreo Ruido Inteligente 7 | 1410095 | 5.7% |
| Edificio Marly | 1159797 | 4.7% |
| CAI Villa Nidia | 1127938 | 4.6% |
| CAI Claret | 1105881 | 4.5% |
| Santa Cecilia | 1105296 | 4.5% |
| CAI Alamos | 1103731 | 4.5% |
| CAI Normandia | 1088263 | 4.4% |
| CAI Rincon | 1085760 | 4.4% |
| Other values (20) | 12482349 | 50.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 39 |
|---|---|
| Median length | 12 |
| Mean length | 14.26484702 |
| Min length | 3 |
Variable
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.0 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Leq | 4908126 | 19.9% |
| Lmin | 3957737 | 16.1% |
| L10 | 3957165 | 16.1% |
| Lmax | 3956875 | 16.1% |
| L90 | 3956054 | 16.1% |
| L50 | 3725702 | 15.1% |
| Velocidad del Viento | 29811 | 0.1% |
| Temperatura Ambiente | 29742 | 0.1% |
| Dirección del Viento | 29739 | 0.1% |
| Humedad Relativa | 29696 | 0.1% |
| Other values (2) | 58973 | 0.2% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 20 |
|---|---|
| Median length | 3 |
| Mean length | 3.429623833 |
| Min length | 3 |
Componente
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.0 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Ruido | 24461659 | 99.3% |
| Meteorología | 177961 | 0.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 12 |
|---|---|
| Median length | 5 |
| Mean length | 5.050557882 |
| Min length | 5 |
Fecha
Date
| Distinct | 98402 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.0 MiB |
| Minimum | 2017-06-30 08:00:00 |
|---|---|
| Maximum | 2020-08-07 23:00:58 |
Histogram with fixed size bins (bins=50)
Valor
Numeric
| Distinct | 1267260 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.85356688 |
|---|---|
| Minimum | -180819 |
| Maximum | 15971.49995 |
| Zeros | 67366 |
| Zeros (%) | 0.3% |
| Memory size | 188.0 MiB |
Quantile statistics
| Minimum | -180819 |
|---|---|
| 5-th percentile | 15.8 |
| Q1 | 41.2 |
| median | 53.3 |
| Q3 | 63.97430023 |
| 95-th percentile | 82.40000153 |
| Maximum | 15971.49995 |
| Range | 196790.5 |
| Interquartile range (IQR) | 22.77430023 |
Descriptive statistics
| Standard deviation | 326.0765831 |
|---|---|
| Coefficient of variation (CV) | 6.540687127 |
| Kurtosis | 284395.7524 |
| Mean | 49.85356688 |
| Median Absolute Deviation (MAD) | 11.3 |
| Skewness | -519.2066934 |
| Sum | 1228372943 |
| Variance | 106325.938 |
| Monotonicity | Not monotonic |
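The coefficient of variation, skewness and kurtosis above are all moment ratios, so a handful of extreme values can dominate them. The sketch below uses plain population moments, which is an assumption: the profiler likely applies bias-corrected sample estimators, so results on small samples would differ slightly. The `clean` and `with_sentinel` lists are invented for illustration.

```python
from math import sqrt


def moment_stats(values):
    """Population mean, std, CV, skewness and excess kurtosis of a list."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,                    # undefined when the mean is 0
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


clean = [52.0, 55.5, 56.0, 57.0, 59.5]   # hypothetical readings
with_sentinel = clean + [-999.0]          # one placeholder-like value added

print(moment_stats(clean)["skewness"])
print(moment_stats(with_sentinel)["skewness"])

# The reported CV is simply standard deviation divided by mean.
reported_cv = 326.0765831 / 49.85356688
print(reported_cv)
```

A single large negative value is enough to pull the skewness of the toy sample strongly negative, which is the same mechanism behind the extreme skewness and kurtosis reported here.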
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 67366 | 0.3% |
| -999 | 59777 | 0.2% |
| 57 | 58474 | 0.2% |
| 56 | 58440 | 0.2% |
| 56.5 | 58438 | 0.2% |
| 55.5 | 58221 | 0.2% |
| 59.5 | 57821 | 0.2% |
| 55 | 57732 | 0.2% |
| 54.5 | 57664 | 0.2% |
| 56.3 | 57490 | 0.2% |
| Other values (1267250) | 24048197 | 97.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -180819 | 59 | < 0.1% |
| -179820 | 12 | < 0.1% |
| -173826 | 1 | < 0.1% |
| -166833 | 1 | < 0.1% |
| -158841 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15971.49995 | 1 | < 0.1% |
| 14073.40002 | 1 | < 0.1% |
| 11243.80003 | 1 | < 0.1% |
| 10497.20003 | 1 | < 0.1% |
| 10403.60003 | 1 | < 0.1% |
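The five lowest values listed above are all exact multiples of -999 (for example, -180819 = 181 × -999), and -999 itself is the second most common value. That pattern is consistent with -999 being a missing-data placeholder that was summed during some upstream aggregation, but the report does not confirm this, so the sketch below treats it as an assumption. `SENTINEL`, `is_sentinel_multiple` and `drop_sentinels` are hypothetical names introduced for illustration.

```python
SENTINEL = -999  # assumed missing-data marker; not confirmed by the report


def is_sentinel_multiple(value, sentinel=SENTINEL):
    """True if value is negative and an exact multiple of the sentinel."""
    return value < 0 and value % sentinel == 0


def drop_sentinels(values):
    """Keep only the values that do not look like (summed) sentinels."""
    return [v for v in values if not is_sentinel_multiple(v)]


lowest_reported = [-180819, -179820, -173826, -166833, -158841]

# How many sentinels each extreme value would correspond to.
multipliers = [v // SENTINEL for v in lowest_reported]
print(multipliers)

# Zero is kept: the report gives no basis for treating it as a placeholder.
print(drop_sentinels([56.3, -999, 0, 57, -179820]))
```

A filter like this should be checked against the data source's documentation before use, since a genuine measurement could in principle coincide with a multiple of the sentinel.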
Ponderación
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 177961 |
| Missing (%) | 0.7% |
| Memory size | 188.0 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Lin | 21800425 | 88.5% |
| A | 1920324 | 7.8% |
| C | 740910 | 3.0% |
| (Missing) | 177961 | 0.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 2.783987415 |
| Min length | 1 |
Tipo
Categorical
| Distinct | 40 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 177961 |
| Missing (%) | 0.7% |
| Memory size | 188.0 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Leq | 874096 | 3.5% |
| Impulso | 867638 | 3.5% |
| Pico | 740946 | 3.0% |
| 1/3 Oct 10kHz | 678502 | 2.8% |
| 1/3 Oct 100Hz | 678485 | 2.8% |
| 1/3 Oct 25Hz | 678311 | 2.8% |
| 1/3 Oct 6.3kHz | 678282 | 2.8% |
| 1/3 Oct 315Hz | 678255 | 2.8% |
| 1/3 Oct 20kHz | 678159 | 2.8% |
| 1/3 Oct 12.5kHz | 678126 | 2.8% |
| Other values (30) | 17230859 | 69.9% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 15 |
|---|---|
| Median length | 13 |
| Mean length | 11.96864176 |
| Min length | 3 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
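The calculation described above can be sketched in a few lines of standard-library Python. This is an illustration of the formula, not the profiler's own implementation; `pearson_r` and the sample lists are names and data introduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their std devs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # fails if either input is constant


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 10 for y in ys])
print(r, r_rescaled)
```

The second call demonstrates the location and scale invariance mentioned in the text. A negative scale factor would flip the sign of r while keeping its magnitude.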
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
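As a sketch of that definition, the code below converts each variable to ranks and then applies the Pearson formula to the ranks. Tied values receive the average of the ranks they span, which is a common convention but an assumption here, since the report does not say how ties are handled. All function names are introduced for illustration.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank-transformed variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For the cubic relationship the rank correlation is perfect while Pearson's r falls short of 1, which is the behaviour the paragraph above describes.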
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, determine the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
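The pair-counting definition above corresponds to the variant usually called tau-a. A direct, quadratic-time sketch follows; it is meant to make the definition concrete and would not scale to a dataset of this size. Statistical libraries commonly use faster algorithms and a tie-adjusted variant (tau-b), so their results can differ when ties are present. `kendall_tau_a` is a name introduced here.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count toward neither total.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the last call one of the six pairs is discordant and five are concordant, giving (5 - 1) / 6.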
Phik (φk)
Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for Phik.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | Estación | Variable | Componente | Fecha | Valor | Ponderación | Tipo |
|---|---|---|---|---|---|---|---|---|
| 0 | 42292150 | CAI Quirigua | Leq | Ruido | 2020-02-25 05:01:03 | 51.70 | Lin | 1/3 Oct 4kHz |
| 1 | 42388519 | CAI Quirigua | Leq | Ruido | 2020-03-15 03:00:58 | 56.20 | Lin | 1/3 Oct 250Hz |
| 2 | 43466573 | CAI Rincon | L50 | Ruido | 2019-07-20 16:00:00 | 61.70 | Lin | 1/3 Oct 125Hz |
| 3 | 15421732 | Edificio Marly | L90 | Ruido | 2019-07-26 02:00:00 | 71.30 | C | Pico |
| 4 | 29309660 | CAI Americas | Lmax | Ruido | 2019-08-16 19:00:00 | 78.20 | Lin | 1/3 Oct 1.25kHz |
| 5 | 43249386 | CAI Rincon | L90 | Ruido | 2019-06-06 03:00:00 | 34.50 | Lin | 1/3 Oct 250Hz |
| 6 | 40083635 | CAI Normandia | L90 | Ruido | 2019-12-18 10:00:31 | 64.80 | Lin | 1/3 Oct 40Hz |
| 7 | 40720762 | CAI Normandia | L50 | Ruido | 2020-05-01 21:00:46 | 52.70 | Lin | 1/3 Oct 400Hz |
| 8 | 8213734 | CAI Tejar | L90 | Ruido | 2020-06-04 20:00:31 | 55.20 | Lin | 1/3 Oct 40Hz |
| 9 | 7908828 | CAI Tejar | L10 | Ruido | 2020-04-04 11:00:31 | 61.10 | Lin | 1/3 Oct 200Hz |
Last rows
| | df_index | Estación | Variable | Componente | Fecha | Valor | Ponderación | Tipo |
|---|---|---|---|---|---|---|---|---|
| 24639610 | 23804100 | Santa Cecilia | L10 | Ruido | 2019-10-18 22:00:00 | 57.60 | Lin | 1/3 Oct 25Hz |
| 24639611 | 20723228 | Hotel Morrison | Lmin | Ruido | 2020-05-04 23:00:31 | 20.70 | Lin | 1/3 Oct 3.15kHz |
| 24639612 | 37100377 | CAI Las Ferias | L10 | Ruido | 2020-03-08 06:00:58 | 61.20 | Lin | 1/3 Oct 1.25kHz |
| 24639613 | 1963843 | Estación Monitoreo Ruido Inteligente 7 | Leq | Ruido | 2020-05-15 12:49:17 | 51.95 | Lin | 1/3 Oct 6.3Hz |
| 24639614 | 10566042 | CAI Venecia | L90 | Ruido | 2020-01-30 02:00:58 | 27.90 | Lin | 1/3 Oct 2kHz |
| 24639615 | 316098 | Estación Monitoreo Ruido Inteligente 13 | Leq | Ruido | 2020-03-22 19:22:17 | 14.97 | Lin | 1/3 Oct 10kHz |
| 24639616 | 46589276 | CAI San Victorino | L90 | Ruido | 2019-06-08 08:00:00 | 67.10 | A | Leq |
| 24639617 | 9580084 | CAI Venecia | L10 | Ruido | 2019-09-07 07:00:00 | 66.60 | Lin | 1/3 Oct 200Hz |
| 24639618 | 4711404 | CAI 20 de Julio | Lmin | Ruido | 2019-08-13 00:00:00 | 32.20 | Lin | 1/3 Oct 1.25kHz |
| 24639619 | 13130449 | CAI Villa Nidia | Leq | Ruido | 2019-06-28 04:00:00 | 54.80 | Lin | 1/3 Oct 315Hz |